168 research outputs found

    ωTest: WebView-Oriented Testing for Android Applications

    WebView is a UI widget that helps integrate web applications into the native context of Android apps. It provides powerful mechanisms for bi-directional interactions between the native end (Java) and the web end (JavaScript) of an Android app. However, these interaction mechanisms are complicated and have induced various types of bugs. To mitigate the problem, various techniques have been proposed to detect WebView-induced bugs via dynamic analysis, which relies heavily on executing tests to explore WebView behaviors. Unfortunately, these techniques either require manual effort or adopt random test generation approaches, which cannot effectively explore diverse WebView behaviors. In this paper, we study the problem of test generation for WebViews in Android apps. Effective test generation for WebViews requires identifying the essential program properties to be covered by the generated tests. To this end, we propose WebView-specific properties to characterize WebView behaviors, and devise a cross-language dynamic analysis method to identify these properties. We develop ωTest, a test generation technique that searches for event sequences covering the identified WebView-specific properties. An evaluation on 74 real-world open-/closed-source Android apps shows that ωTest can cover diverse WebView behaviors and detect WebView-induced bugs effectively. ωTest detected 36 previously unknown bugs. Of the 22 bugs that we reported to the app developers, 13 were confirmed, 9 of which were fixed. Comment: Accepted by the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023)

    Fuzzing Deep Learning Compilers with HirGen

    Deep Learning (DL) compilers are widely adopted to optimize advanced DL models for efficient deployment on diverse hardware. Their quality has a profound effect on the quality of compiled DL models. A recent bug study shows that the optimization of high-level intermediate representation (IR) is the most error-prone compilation stage. Bugs in this stage account for 44.92% of all collected bugs. However, existing testing techniques do not consider features related to high-level optimization (e.g., high-level IR), and are therefore weak in exposing bugs at this stage. To bridge this gap, we propose HirGen, an automated testing technique that aims to effectively expose coding mistakes in the optimization of high-level IR. The design of HirGen includes 1) three coverage criteria to generate diverse and valid computational graphs; 2) full use of high-level IRs' language features to generate diverse IRs; 3) three test oracles inspired by both differential testing and metamorphic testing. HirGen has successfully detected 21 bugs in TVM, with 17 bugs confirmed and 12 fixed. Further, we construct four baselines using the state-of-the-art DL compiler fuzzers that can cover the high-level optimization stage. Our experiment results show that HirGen can detect 10 crashes and inconsistencies that cannot be detected by the baselines in 48 hours. We further validate the usefulness of our proposed coverage criteria and test oracles in the evaluation.
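As a rough illustration of the differential-testing idea mentioned in the abstract above, the sketch below compares an unoptimized reference execution against an optimized one and reports crashes and inconsistencies. It is a generic sketch, not HirGen's actual oracles: `differential_oracle`, `reference`, and `buggy_optimized` are hypothetical names, and plain Python functions stand in for compiled models.

```python
import math


def differential_oracle(reference_fn, optimized_fn, inputs, rel_tol=1e-6):
    """Run both versions on the same inputs and classify the outcome.

    Returns a (verdict, detail) tuple. All names are illustrative only.
    """
    try:
        expected = reference_fn(inputs)
    except Exception as exc:
        # The reference itself fails, so the test input is not usable.
        return ("invalid_test", repr(exc))
    try:
        actual = optimized_fn(inputs)
    except Exception as exc:
        # Only the optimized version fails: a candidate crash bug.
        return ("crash", repr(exc))
    if len(expected) != len(actual):
        return ("inconsistency", "output lengths differ")
    for i, (e, a) in enumerate(zip(expected, actual)):
        # Tolerate small floating-point drift introduced by optimization.
        if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=1e-9):
            return ("inconsistency", f"element {i}: {e} != {a}")
    return ("pass", "")


def reference(xs):
    """Stand-in for the unoptimized model: y = 2x + 1."""
    return [x * 2.0 + 1.0 for x in xs]


def buggy_optimized(xs):
    """Stand-in for a miscompiled model that drops the bias term."""
    return [x * 2.0 for x in xs]


verdict, detail = differential_oracle(reference, buggy_optimized, [1.0, 2.0])
```

Here the oracle flags the toy miscompilation as an inconsistency, while comparing `reference` against itself passes.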

    Programming by Example Made Easy

    Programming by example (PBE) is an emerging programming paradigm that automatically synthesizes programs specified by user-provided input-output examples. Despite the convenience for end-users, implementing PBE tools often requires strong expertise in programming languages and synthesis algorithms. Such a level of knowledge is uncommon among software developers, which greatly limits the broad adoption of PBE by the industry. To facilitate the adoption of PBE techniques, we propose a PBE framework called Bee, which leverages an "entity-action" model based on relational tables to ease PBE development for a wide but restrained range of domains. Implementing PBE tools with Bee only requires adapting domain-specific data entities and user actions to tables, with no need to design a domain-specific language or an efficient synthesis algorithm. The synthesis algorithm of Bee exploits bidirectional searching and constraint-solving techniques to address the challenge of value computation nested in table transformation. We evaluated Bee's effectiveness on 64 PBE tasks from three different domains and its usability with a human study of 12 participants. Evaluation results show that Bee is easier to learn and use than the state-of-the-art PBE framework, and the bidirectional algorithm achieves comparable performance to domain-specifically optimized synthesizers. Comment: Accepted by ACM Transactions on Software Engineering and Methodology

    MEMO: Coverage-guided Model Generation For Deep Learning Library Testing

    Recent deep learning (DL) applications are mostly built on top of DL libraries. The quality assurance of these libraries is critical to the dependable deployment of DL applications. A few techniques have thereby been proposed to test DL libraries by generating DL models as test inputs. These techniques then feed the DL models to DL libraries for making inferences, in order to exercise the DL library modules related to a DL model's execution. However, the test effectiveness of these techniques is constrained by the diversity of generated DL models. Our investigation finds that these techniques can cover at most 11.7% of layer pairs (i.e., call sequences between two layer APIs) and 55.8% of layer parameters (e.g., "padding" in Conv2D). As a result, we find that many bugs arising from specific layer pairs and parameters can be missed by existing techniques. In view of the limitations of existing DL library testing techniques, we propose MEMO to efficiently generate diverse DL models by exploring layer types, layer pairs, and layer parameters. MEMO: (1) designs an initial model reduction technique to boost test efficiency without compromising model diversity; and (2) designs a set of mutation operators for a customized Markov Chain Monte Carlo (MCMC) algorithm to explore new layer types, layer pairs, and layer parameters. We evaluate MEMO on seven popular DL libraries, including four for model execution (TensorFlow, PyTorch, MXNet, and ONNX) and three for model conversion (Keras-MXNet, TF2ONNX, ONNX2PyTorch). The evaluation results show that MEMO outperforms recent works by covering 10.3% more layer pairs, 15.3% more layer parameters, and 2.3% more library branches. Moreover, MEMO detects 29 new bugs in the latest versions of DL libraries, with 17 of them confirmed by DL library developers, and 5 of those confirmed bugs have been fixed. Comment: 11 pages, 8 figures
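To make the layer-pair coverage metric in the abstract above concrete, here is a minimal sketch that treats each generated model as a list of directed edges between layer types and measures the fraction of all possible ordered pairs that the models exercise. This is an assumed reading of "layer pairs", not MEMO's implementation: `layer_pair_coverage`, the edge-list representation, and the three example layer types are all hypothetical.

```python
from itertools import product


def layer_pair_coverage(models, layer_types):
    """Fraction of ordered (source, target) layer-type pairs that appear
    as an edge in at least one model. Illustrative definition only."""
    all_pairs = set(product(layer_types, repeat=2))
    covered = set()
    for edges in models:
        # Keep only pairs built from the layer types under study.
        covered |= set(edges) & all_pairs
    return len(covered) / len(all_pairs)


# Hypothetical layer vocabulary and two toy models given as edge lists.
layer_types = ["Conv2D", "ReLU", "Dense"]
models = [
    [("Conv2D", "ReLU"), ("ReLU", "Dense")],
    [("Conv2D", "ReLU"), ("Dense", "Dense")],
]

coverage = layer_pair_coverage(models, layer_types)
```

With three layer types there are nine ordered pairs, and the two toy models cover three distinct ones, so `coverage` is 3/9. A generator that aims for diversity would try to add models whose edges fall among the six uncovered pairs.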

    Towards Modeling Software Quality of Virtual Reality Applications from Users' Perspectives

    Virtual Reality (VR) technology has become increasingly popular in recent years as a key enabler of the Metaverse. VR applications have unique characteristics, including revolutionized human-computer interaction mechanisms, that distinguish them from traditional software. Hence, user expectations for the software quality of VR applications diverge from those for traditional software. Investigating these quality expectations is crucial for the effective development and maintenance of VR applications, yet it remains an under-explored area in prior research. To bridge the gap, we conduct the first large-scale empirical study to model the software quality of VR applications from users' perspectives. To this end, we analyze 1,132,056 user reviews of 14,150 VR applications across seven app stores through a semiautomatic review mining approach. We construct a taxonomy of 12 software quality attributes that are of major concern to VR users. Our analysis reveals that the VR-specific quality attributes are of utmost importance to users; these attributes are closely related to the most distinctive properties of VR applications, such as revolutionized interaction mechanisms and immersive experiences. Our examination of relevant user complaints reveals the major factors impacting user satisfaction with VR-specific quality attributes. We identify that poor design or implementation of the movement mechanisms, control mechanisms, multimedia systems, and physics can significantly degrade the user experience. Moreover, we discuss the implications of VR quality assurance for both developers and researchers to shed light on future work. For instance, we suggest developers implement sufficient accessibility and comfort options for users with mobility limitations, sensory impairments, and other specific needs to customize the interaction mechanisms. Our datasets and results will be released to facilitate follow-up studies.